<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<html>
<head>
</head>
<body>

<p>Provides classes for working with
the <a href="http://boston.lti.cs.cmu.edu/clueweb09/wiki/tiki-index.php?page=ClueWeb09%20Wiki">ClueWeb09</a>
collection.  The dataset consists of about one billion web pages (5 TB
compressed, 25 TB uncompressed), in ten languages, collected in
January and February 2009.  Its creation, supported by the U.S. National
Science Foundation (NSF), was led
by <a href="http://www.cs.cmu.edu/~callan/">Jamie Callan</a> of the
<a href="http://www.lti.cs.cmu.edu/">Language Technologies
Institute</a> at <a href="http://www.cmu.edu/index.shtml">Carnegie
Mellon University</a> to support research on information retrieval and
related human language technologies. </p>

</body>
</html>
